Semi-Supervised Learning in Gigantic Image Collections

Authors

  • Rob Fergus
  • Yair Weiss
  • Antonio Torralba
Abstract

With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. “Clean labels” can be manually obtained on a small fraction, “noisy labels” may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images gathered from the Internet.
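As a rough illustration of the kind of computation the abstract refers to, the sketch below runs graph-based semi-supervised learning restricted to the few smoothest eigenvectors of the normalized graph Laplacian. It is a toy version under stated assumptions, not the authors' algorithm: it builds the full pairwise affinity matrix and computes exact eigenvectors, which is only feasible for small data, whereas the abstract describes replacing that step with eigenfunction approximations that scale linearly. The function name `ssl_eigenbasis`, the Gaussian affinity, the quadratic label-fitting objective, and the parameters `k`, `sigma`, and `lam` are all assumptions made for this example.

```python
import numpy as np

def ssl_eigenbasis(X, y, labeled, k=2, sigma=1.0, lam=100.0):
    """Graph-based SSL in the span of the k smoothest Laplacian eigenvectors.

    X: (n, d) features; y: (n,) labels in {-1, +1} (ignored where unlabeled);
    labeled: (n,) boolean mask. All names and defaults are illustrative.
    """
    # Gaussian affinities between all pairs of points (O(n^2): toy sizes only).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    evals, evecs = np.linalg.eigh(L)
    U, S = evecs[:, :k], np.diag(evals[:k])
    # Assumed objective: a^T S a + (U a - y)^T Lam (U a - y),
    # where Lam is diagonal with weight lam on labeled points and 0 elsewhere.
    lam_vec = np.where(labeled, lam, 0.0)
    A = S + U.T @ (lam_vec[:, None] * U)
    b = U.T @ (lam_vec * np.where(labeled, y, 0.0))
    alpha = np.linalg.solve(A, b)
    # Real-valued scores for every point; the sign gives the predicted class.
    return U @ alpha
```

Working in a k-dimensional eigenbasis reduces the final solve to a k-by-k linear system, which is why the cost of obtaining the basis, rather than the solve itself, dominates at scale.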

Similar resources

Extracting Structures in Image Collections for Object Recognition

Many computer vision methods rely on annotated image sets without taking advantage of the increasing number of unlabeled images available. This paper explores an alternative approach involving unsupervised structure discovery and semi-supervised learning (SSL) in image collections. Focusing on object classes, the first part of the paper contributes with an extensive evaluation of state-of-the-a...

Finding Centuries-Old Hyperlinks: a Novel Semi-Supervised Shape Classifier

Hyperlinks are so useful for searching and browsing modern digital collections that researchers have long wondered if it is possible to retroactively add hyperlinks to digitized historical documents. There has already been significant research into this endeavor for historical text; however, in this work we consider the problem of adding hyperlinks among graphic elements. While such a system ...

SSL-QA: Analysis of Semi-Supervised Learning for Question-Answering

Open domain natural language question answering (QA) is a process of automatically finding answers to questions by searching collections of text files. Question answering (QA) is a long-standing challenge in NLP, and the community has introduced several paradigms and datasets for the task over the past few years. These patterns differ from each other in the type of questions and answers and the si...

Top-down Analysis of Low-level Object Relatedness Leading to Semantic Understanding of Medieval Image Collections

The aim of image understanding, which is a long standing goal of computer vision, is to develop algorithms with which computers can advance to the semantic content of images. One ability of such algorithms would be the automatic discovery of relations between different objects in large collections of images. To analyze this relatedness we present an unsupervised and a semi-supervised approach f...

Deep Learning Neural Network with Semi supervised Segmentation for Predicting Retinal and Cancer Cell Diseased

In the medical field, diseases are diagnosed competently using image processing. However, retrieving the relevant data from the amalgamation of the resulting images is difficult. Here the segmentation is done by semi-supervised learning, and the result is then tuned using a Deep Learning Neural Network. Higher tuning of results will lead to efficient detection of disease. The experiment ...

Publication date: 2009